AITopics | response policy

8fae6a68aaf1e05bfd90375755b63821-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 15:03:29 GMT

artificial intelligence, bayesian inference, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)


A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

YAN ZHENG, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan

Neural Information Processing Systems | Feb-13-2026, 13:17:08 GMT

In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly.

artificial intelligence, machine learning, opponent, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

YAN ZHENG, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan

Neural Information Processing Systems | Nov-20-2025, 17:57:46 GMT

There also exist many application scenarios involving multiagent interactions, commonly known as multiagent systems (MAS).

artificial intelligence, opponent, response policy, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
North America > Canada > Quebec > Montreal (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Industry: Leisure & Entertainment (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)


Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators

Neural Information Processing Systems | Oct-10-2025, 09:29:41 GMT

We provide theoretical guarantees for CBPR's rapid convergence to the optimal policy once human partners alter their policies.

artificial intelligence, bayesian inference, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)


STLGame: Signal Temporal Logic Games in Adversarial Multi-Agent Systems

Yang, Shuo, Zheng, Hongrui, Vasile, Cristian-Ioan, Pappas, George, Mangharam, Rahul

arXiv.org Artificial Intelligence | Dec-2-2024

We study how to synthesize a robust and safe policy for autonomous systems under signal temporal logic (STL) tasks in adversarial settings against unknown dynamic agents. To ensure the worst-case STL satisfaction, we propose STLGame, a framework that models the multi-agent system as a two-player zero-sum game, where the ego agents try to maximize the STL satisfaction and other agents minimize it. STLGame aims to find a Nash equilibrium policy profile, which is the best case in terms of robustness against unseen opponent policies, by using the fictitious self-play (FSP) framework. FSP iteratively converges to a Nash profile, even in games set in continuous state-action spaces. We propose a gradient-based method with differentiable STL formulas, which is crucial in continuous settings to approximate the best responses at each iteration of FSP. We show this key aspect experimentally by comparing with reinforcement learning-based methods to find the best response. Experiments on two standard dynamical system benchmarks, Ackermann steering vehicles and autonomous drones, demonstrate that our converged policy is almost unexploitable and robust to various unseen opponents' policies. All code and additional experimental results can be found on our project website: https://sites.google.com/view/stlgame

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.01656

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)


Adaptive Opponent Policy Detection in Multi-Agent MDPs: Real-Time Strategy Switch Identification Using Running Error Estimation

Mridul, Mohidul Haque, Khan, Mohammad Foysal, Rizvee, Redwan Ahmed, Khan, Md Mosaddek

arXiv.org Artificial Intelligence | Jun-10-2024

In Multi-agent Reinforcement Learning (MARL), accurately perceiving opponents' strategies is essential for both cooperative and adversarial contexts, particularly within dynamic environments. While Proximal Policy Optimization (PPO) and related algorithms such as Actor-Critic with Experience Replay (ACER), Trust Region Policy Optimization (TRPO), and Deep Deterministic Policy Gradient (DDPG) perform well in single-agent, stationary environments, they suffer from high variance in MARL due to non-stationary and hidden policies of opponents, leading to diminished reward performance. Additionally, existing methods in MARL face significant challenges, including the need for inter-agent communication, reliance on explicit reward information, high computational demands, and sampling inefficiencies. These issues render them less effective in continuous environments where opponents may abruptly change their policies without prior notice. Against this background, we present OPS-DeMo (Online Policy Switch-Detection Model), an online algorithm that employs dynamic error decay to detect changes in opponents' policies. OPS-DeMo continuously updates its beliefs using an Assumed Opponent Policy (AOP) Bank and selects corresponding responses from a pre-trained Response Policy Bank. Each response policy is trained against consistently strategizing opponents, reducing training uncertainty and enabling the effective use of algorithms like PPO in multi-agent environments. Comparative assessments show that our approach outperforms PPO-trained models in dynamic scenarios like the Predator-Prey setting, providing greater robustness to sudden policy shifts and enabling more informed decision-making through precise opponent policy insights.

agent, algorithm, opponent, (15 more...)

arXiv.org Artificial Intelligence

2406.065

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
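
The OPS-DeMo abstract above describes detecting an opponent's policy switch from a decaying error signal and then choosing a response from a pre-trained bank. A minimal sketch of that idea follows. The abstract does not give the error definition, so the exponentially decayed "surprise" score (one minus the probability an assumed policy gives to the observed action), the class name `SwitchDetector`, and the `decay` and `threshold` values are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import Dict

# Hypothetical representation: policy name -> state -> action -> probability.
PolicyBank = Dict[str, Dict[str, Dict[str, float]]]


class SwitchDetector:
    """Tracks a decayed running error for each assumed opponent policy."""

    def __init__(self, policies: PolicyBank, decay: float = 0.9,
                 threshold: float = 0.5) -> None:
        self.policies = policies
        self.decay = decay          # weight kept from the previous error
        self.threshold = threshold  # error level that triggers a switch
        self.errors = {name: 0.0 for name in policies}
        self.current = next(iter(policies))  # start with the first policy

    def observe(self, state: str, action: str) -> str:
        """Update every running error, then re-select the assumed policy."""
        for name, policy in self.policies.items():
            prob = policy.get(state, {}).get(action, 0.0)
            surprise = 1.0 - prob  # high when the action was unlikely
            self.errors[name] = (self.decay * self.errors[name]
                                 + (1.0 - self.decay) * surprise)
        # Switch only when the currently assumed policy explains
        # recent behaviour poorly.
        if self.errors[self.current] > self.threshold:
            self.current = min(self.errors, key=self.errors.get)
        return self.current


if __name__ == "__main__":
    bank: PolicyBank = {
        "chaser": {"s": {"left": 0.9, "right": 0.1}},
        "evader": {"s": {"left": 0.1, "right": 0.9}},
    }
    detector = SwitchDetector(bank)
    for action in ["left"] * 5 + ["right"] * 20:
        assumed = detector.observe("s", action)
    print(assumed)
```

The name returned by `observe` would index into a response policy bank; a slower decay reacts later to a switch but is less easily fooled by a few unlikely actions.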


On-the-fly Strategy Adaptation for ad-hoc Agent Coordination

Zand, Jaleh, Parker-Holder, Jack, Roberts, Stephen J.

arXiv.org Machine Learning | Mar-7-2022

Training agents in cooperative settings offers the promise of AI agents able to interact effectively with humans (and other agents) in the real world. Multi-agent reinforcement learning (MARL) has the potential to achieve this goal, demonstrating success in a series of challenging problems. However, whilst these advances are significant, the vast majority of focus has been on the self-play paradigm. This often results in a coordination problem, caused by agents learning to make use of arbitrary conventions when playing with themselves. This means that even the strongest self-play agents may have very low cross-play with other agents, including other initializations of the same algorithm. In this paper we propose to solve this problem by adapting agent strategies on the fly, using a posterior belief over the other agents' strategy. Concretely, we consider the problem of selecting a strategy from a finite set of previously trained agents, to play with an unknown partner. We propose an extension of the classic statistical technique, Gibbs sampling, to update beliefs about other agents and obtain close to optimal ad-hoc performance. Despite its simplicity, our method is able to achieve strong cross-play with unseen partners in the challenging card game of Hanabi, achieving successful ad-hoc coordination without knowledge of the partner's strategy a priori.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2203.08015

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)


Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network

Mahmud, Rafid Ameer, Faisal, Fahim, Mahmud, Saaduddin, Khan, Md. Mosaddek

arXiv.org Artificial Intelligence | Oct-16-2021

The Multi-agent Markov Decision Process (MMDP) has been an effective way of modelling sequential decision-making algorithms for multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, a dynamically changing environment, coupled with the exponential size of the state and joint action space, makes it difficult for these algorithms to provide both efficiency and scalability. Recently, the centralized planning algorithm FV-MCTS-MP and the decentralized planning algorithm Alternate maximization with Behavioural Cloning (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation-based online planning algorithm, which we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses a Coordination Graph (CG) and a Graph Convolutional Network (GCN) to learn cooperation and provide a real-time solution to an MMDP problem. It also improves scalability through effective pruning of the action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide a theoretical discussion of the convergence property of our algorithm within the context of multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.

artificial intelligence, machine learning, planning & scheduling, (18 more...)

arXiv.org Artificial Intelligence

2110.0848

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

ZHENG, YAN, Meng, Zhaopeng, Hao, Jianye, Zhang, Zongzhang, Yang, Tianpei, Fan, Changjie

Neural Information Processing Systems | Dec-31-2018

In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly. This paper studies efficient policy detection and reuse techniques when playing against non-stationary agents in Markov games. We propose a new deep BPR+ algorithm by extending the recent BPR+ algorithm with a neural network as the value-function approximator. To detect policies accurately, we propose the rectified belief model, which takes advantage of the opponent model to infer the other agent's policy from reward signals and its behaviors. Instead of directly storing individual policies as BPR+ does, we introduce a distilled policy network that serves as the policy library in BPR+, using policy distillation to achieve efficient online policy learning and reuse. Deep BPR+ inherits all the advantages of BPR+ and empirically shows better performance in terms of detection accuracy, cumulative rewards and speed of convergence compared to existing algorithms in complex Markov games with raw visual inputs.

artificial intelligence, machine learning, opponent, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
North America > Canada > Quebec > Montreal (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre:

Research Report (0.48)
Overview (0.48)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
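
The Deep BPR+ abstract above relies on maintaining a belief over the other agent's policy and reusing a matching response. A minimal sketch of the generic Bayes-rule belief update that Bayesian policy reuse builds on is shown here. It is not the paper's rectified belief model or its distilled policy network: the function names, the dictionary layout, and the example numbers are hypothetical, and the likelihoods are assumed to come from some performance or opponent model supplied by the caller.

```python
from typing import Dict


def update_belief(belief: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """One Bayes step: posterior(tau) is proportional to
    likelihood(observation | tau) * prior(tau)."""
    unnormalized = {tau: likelihood[tau] * prior
                    for tau, prior in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every candidate policy;
        # keep the prior rather than divide by zero.
        return dict(belief)
    return {tau: value / total for tau, value in unnormalized.items()}


def select_response(belief: Dict[str, float],
                    expected_return: Dict[str, Dict[str, float]]) -> str:
    """Pick the response policy with the highest belief-weighted return.

    expected_return[response][tau] is the (assumed known) return of
    playing `response` against opponent policy `tau`.
    """
    def score(response: str) -> float:
        return sum(belief[tau] * expected_return[response][tau]
                   for tau in belief)
    return max(expected_return, key=score)


if __name__ == "__main__":
    belief = {"A": 0.5, "B": 0.5}
    # Hypothetical likelihood of the latest observation under each policy.
    belief = update_belief(belief, {"A": 0.9, "B": 0.1})
    returns = {
        "counter_A": {"A": 1.0, "B": -1.0},
        "counter_B": {"A": -1.0, "B": 1.0},
    }
    print(belief, select_response(belief, returns))
```

Selecting by belief-weighted expected return is only one simple choice; the BPR literature also considers exploration-aware selection rules, which this sketch leaves out.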

